Module 01 · Foundations

What is AutoGen?

Microsoft Research's framework for building multi-agent AI systems that can reason, collaborate, and execute code.

AutoGen is an open-source framework from Microsoft Research that lets you build systems where multiple AI agents work together to solve complex tasks. Think of it as a runtime for AI teamwork: agents can talk to each other, use tools, write and execute code, and ask humans for input.

Unlike workflow tools (n8n, Zapier) that execute deterministic steps, AutoGen agents reason autonomously. They decide how to solve a problem, not just follow a pre-written path.

Why does AutoGen exist?

🧩
Single LLM Limits
One LLM has a limited context window and can hallucinate. Multiple agents can verify each other, split tasks, and specialize.
🔁
Iterative Refinement
Agents can critique and improve each other's outputs: a critic agent reviewing a writer agent's code, for instance.
🛠️
Tool Execution
Agents can write Python, run it in a sandbox, see the result, fix bugs, closing the loop between planning and execution.
🧑‍💻
Human in the Loop
You control how much autonomy agents have. Inject human approval at any step, or let them run fully automated.

The Big Picture

Human / Task → UserProxy Agent ↔ AssistantAgent → Code Executor → Result
💡
A note on versions: AutoGen v0.4 introduced a redesigned, async-first API (the autogen-agentchat package) built around the same ideas: assistant agents, user proxies, and group chats. The examples in this course use the classic 0.2-style API (import autogen from the pyautogen package), which remains widely used and maps directly onto those concepts.

AutoGen vs. The World

Framework | Paradigm | Best For
AutoGen | Autonomous multi-agent conversation | Complex reasoning, code generation, research tasks
LangGraph | Stateful graph-based workflows | Fine-grained control over agent state & branching
CrewAI | Role-based agent teams | Business automation with defined roles
n8n | Deterministic workflow automation | Integrating SaaS tools with predictable logic
🧠 Quick Check: What fundamentally distinguishes AutoGen from n8n?
A AutoGen is newer and faster
B AutoGen agents reason autonomously; n8n executes deterministic steps
C n8n can't use AI; AutoGen can
D AutoGen only works with OpenAI models
Module 02 · Foundations

Core Concepts

The building blocks: agents, conversations, termination, and the LLM config pattern.

The Two Primary Agents

Every AutoGen system is built from two fundamental agent types. Understanding these is the most important thing in this entire course.

🤖
AssistantAgent
Powered by an LLM. Receives messages, reasons, and replies. It can suggest code, call functions, and generate plans. Doesn't execute code by default.
👤
UserProxyAgent
Represents a human or an executor. Can run code that the AssistantAgent produces, then feed results back. May prompt a real human for approval.

LLM Configuration

All agents that use an LLM require a config dict. This is where you define the model and API key.

llm_config.py
import autogen

# Define your LLM config
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o",
            "api_key": "sk-...",  # or use env var
        }
    ],
    "temperature": 0.1,      # low = more deterministic
    "cache_seed": 42,        # reproducible runs (optional)
}

# Or load from a JSON file (recommended for production)
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST"
)

Conversations & Termination

A conversation begins when one agent initiates a chat with another. Agents take turns. This continues until a termination condition is met.

termination.py
# Termination methods:

# 1. Max turns
user_proxy.initiate_chat(assistant, max_turns=5)

# 2. Keyword in reply ("TERMINATE")
assistant = autogen.AssistantAgent(
    system_message="""...(your instructions)...
    When done, reply with: TERMINATE"""
)

# 3. Custom function
def my_termination(msg):
    # content can be None (e.g. for tool-call messages), so guard before lower()
    return "task_complete" in (msg.get("content") or "").lower()

user_proxy = autogen.UserProxyAgent(
    is_termination_msg=my_termination
)

Human Input Modes

Mode | Behavior | Use Case
ALWAYS | Asks a human to reply at every step | Interactive sessions, demos
TERMINATE | Asks human only when termination is triggered | Approval gate at the end
NEVER | Fully autonomous; no human prompting | Production pipelines
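To make the modes concrete, here is a toy sketch in plain Python (not AutoGen's actual implementation) of the decision a UserProxyAgent makes on each incoming message:

```python
def should_prompt_human(mode: str, is_termination_msg: bool) -> bool:
    """Toy model of UserProxyAgent's human-input decision.

    mode mirrors human_input_mode: "ALWAYS", "TERMINATE", or "NEVER".
    is_termination_msg is True when the incoming message matched the
    termination condition.
    """
    if mode == "ALWAYS":
        return True                # ask the human at every step
    if mode == "TERMINATE":
        return is_termination_msg  # approval gate only at the end
    return False                   # "NEVER": fully autonomous
```

With mode="NEVER" the proxy never blocks on a human, which is why production pipelines pair it with a hard termination condition.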
⚠️
Code Execution Safety: UserProxyAgent can execute code. Always set code_execution_config with a Docker sandbox or a restricted local path in production. Never run untrusted agent code on bare metal.
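A hedged sketch of what a safer configuration might look like (the keys follow the classic pyautogen code_execution_config; verify them against your installed version):

```python
# Safer code-execution settings for production runs (key names assume the
# classic pyautogen API).
code_execution_config = {
    "work_dir": "sandbox",             # confine generated files to one directory
    "use_docker": "python:3.11-slim",  # run inside this image (True = default image)
    "timeout": 60,                     # seconds before a runaway script is killed
    "last_n_messages": 3,              # only scan recent messages for code blocks
}
```

Passing an explicit image name pins the execution environment, which also makes agent-written code more reproducible across machines.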
🧠 Which agent actually runs Python code that the other agent writes?
A AssistantAgent
B UserProxyAgent
C Both run code equally
D Neither β€” AutoGen can't execute code
Module 03 · Foundations

Your First Agents

Build a working two-agent system in about 25 lines of Python.

Installation

terminal
# Install AutoGen (classic pyautogen API, as used in this course)
pip install pyautogen

# For code execution in Docker (recommended)
pip install "pyautogen[docker]"

# Set your API key
export OPENAI_API_KEY="sk-..."

Hello World: Two-Agent System

hello_autogen.py
import autogen

# 1. LLM config
llm_config = {
    "config_list": [{"model": "gpt-4o", "api_key": "sk-..."}]
}

# 2. The AI assistant agent
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="""You are a helpful Python expert.
    When you finish the task, reply with: TERMINATE"""
)

# 3. The user proxy (executes code, no human input)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # set True in production
    }
)

# 4. Start the conversation!
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that prints the first 10 Fibonacci numbers."
)

What Happens Step by Step

1. user_proxy sends task → "Write Fibonacci script"
2. assistant reasons → LLM generates a Python code block
3. user_proxy executes → runs the code, captures stdout/stderr
4. result sent back → output fed to assistant as the next message
5. assistant replies TERMINATE → conversation ends
✅
No code to execute? If the message contains no code block, user_proxy sends a canned reply like "There is no code from the last message, provide the code." This automatically nudges the assistant to write code.
🧠 In the example above, what triggers the conversation to stop?
A A max_turns limit of 10
B The code executing successfully
C The assistant including "TERMINATE" in its reply
D AutoGen detects the task is done automatically
Module 04 · Patterns

Conversation Patterns

Two-agent, sequential chaining, nested chats, and when to use each.

Pattern 1: Two-Agent (Default)

You've seen this. One user_proxy, one assistant. Best for focused single tasks: code generation, Q&A, analysis.

Pattern 2: Sequential Chaining

Run multiple two-agent conversations where the output of one feeds into the next. Useful for pipelines (write → review → deploy).

sequential.py
# Step 1: Writer agent creates a blog post
result1 = user_proxy.initiate_chat(
    writer, message="Write a blog post about RAG systems"
)
draft = result1.summary

# Step 2: Critic reviews it
result2 = user_proxy.initiate_chat(
    critic, message=f"Review this blog post:\n\n{draft}"
)

# Step 3: Editor applies improvements
result3 = user_proxy.initiate_chat(
    editor, message=f"Apply this feedback:\n{result2.summary}"
)

Pattern 3: Nested Chats

An agent can spawn a sub-conversation mid-conversation. This is powerful for tasks that require a specialist to handle a sub-task before the main flow continues.

nested.py
# Register a nested chat β€” when assistant gets a coding task,
# it spawns a full sub-conversation with a coding specialist
assistant.register_nested_chats(
    trigger=user_proxy,
    chat_queue=[
        {
            "recipient": coding_specialist,
            "message": "Please implement this: ",
            "summary_method": "last_msg",
            "max_turns": 3,
        }
    ]
)

Pattern 4: Swarm (v0.4)

AutoGen 0.4 introduced a Swarm pattern: agents can hand off to each other dynamically based on context, like a call-routing system.

swarm.py
from autogen import SwarmAgent, initiate_swarm_chat

# Each agent declares who it can hand off to.
# NOTE: treat this as a sketch; swarm class and argument names have
# shifted between releases, so check the docs for your installed version.
triage = SwarmAgent(
    name="triage",
    llm_config=llm_config,
    handoffs=["billing_agent", "tech_support", "sales"]
)

initiate_swarm_chat(
    initial_agent=triage,
    agents=[triage, billing, tech_support, sales],
    messages="I can't access my account after payment failed"
)
🗺️
Choosing a pattern: Single focused task → Two-agent. Sequential pipeline → Chaining. Complex orchestration with sub-tasks → Nested chats. Dynamic routing by context → Swarm. Team collaboration → Group Chat (next module).
Module 05 · Patterns

Tool Use & Function Calling

Give agents real-world capabilities: web search, database queries, API calls.

By default, agents can only reason and write code. Tools give them real-world capabilities. AutoGen wraps OpenAI function calling: agents decide when and how to call tools.

Defining Tools with Decorators

tools.py
import autogen
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER",
    code_execution_config=False  # disable code exec, use tools instead
)

# ✨ Register a tool: the function becomes executable by user_proxy, and the
#    description is shown to the LLM so it knows when to call it
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Get current weather for a city")
def get_weather(city: str) -> str:
    # Call a real weather API here
    return f"Weather in {city}: 22°C, sunny"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search the web for current info")
def web_search(query: str) -> str:
    # Integrate with Serper, Tavily, or Bing here
    return f"Results for '{query}': ..."

user_proxy.initiate_chat(
    assistant,
    message="What's the weather in Tokyo and is there a tech conference there this week?"
)

How Tool Calling Works

Task message → LLM decides to call a tool → user_proxy executes the function → result injected as a tool message → LLM uses the result in its reply
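Concretely, this exchange rides on the OpenAI function-calling message format. A simplified view of the two messages involved (field names follow the OpenAI Chat Completions API; the call ID is illustrative):

```python
import json

# 1. What the LLM emits when it decides to call the tool (simplified)
tool_call_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_abc123",  # illustrative ID
        "type": "function",
        "function": {
            "name": "get_weather",
            # arguments arrive as a JSON-encoded string, not a dict
            "arguments": json.dumps({"city": "Tokyo"}),
        },
    }],
}

# 2. What the executor sends back after running get_weather("Tokyo")
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "Weather in Tokyo: 22°C, sunny",
}
```

The executor must parse the arguments string, call the matching function, and echo the tool_call_id back so the LLM can pair results with requests.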

Tool Best Practices

Type hints matter. AutoGen uses Python type hints to generate the JSON schema for the LLM. Always annotate your function arguments and return type.

Descriptions are prompts. The description string is what the LLM reads to decide when to call your tool. Write it like a clear instruction, not a comment.

Return strings. Tool functions should return strings (or JSON-serializable types that AutoGen will stringify). The LLM reads this as text.
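To see what the type hints buy you, here is a minimal sketch (not AutoGen's actual implementation) of turning a function's annotations into an OpenAI-style parameter schema:

```python
import inspect
from typing import get_type_hints

# Map Python annotation types to JSON-schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn, description: str) -> dict:
    """Build an OpenAI-style function schema from a function's type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return annotation is not a parameter
    properties = {name: {"type": PY_TO_JSON.get(tp, "string")}
                  for name, tp in hints.items()}
    # Parameters without a default value are required
    required = [name for name, p in inspect.signature(fn).parameters.items()
                if p.default is inspect.Parameter.empty]
    return {
        "name": fn.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

def get_weather(city: str, units: str = "celsius") -> str:
    return f"Weather in {city} ({units})"

schema = tool_schema(get_weather, "Get current weather for a city")
```

An unannotated argument would vanish from the schema, which is exactly why "type hints matter": the LLM only sees what the schema describes.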

🔧
Common integrations: Web search (Tavily, Serper), DB queries (SQLAlchemy), REST APIs (requests/httpx), file system ops, sending emails/Slack messages, calling other microservices.
🧠 When you register a tool, why does the assistant need register_for_llm and the user_proxy needs register_for_execution?
A It's just boilerplate β€” they're identical under the hood
B The LLM needs to know about the tool to decide to call it; the executor actually runs it
C Both agents run the function independently
D register_for_llm is only needed for GPT-4, not other models
Module 06 · Patterns

Group Chat

Orchestrate 3+ specialized agents collaborating on a shared task.

Group chat is where AutoGen really shines for complex tasks. You assemble a team of specialized agents (researcher, coder, critic, planner) and a GroupChatManager decides which agent speaks next.

Setting Up a Group Chat

group_chat.py
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

# Specialized agents with distinct system messages
planner = autogen.AssistantAgent(
    name="Planner", llm_config=llm_config,
    system_message="You break complex tasks into subtasks and assign them."
)

coder = autogen.AssistantAgent(
    name="Coder", llm_config=llm_config,
    system_message="You write high-quality Python code. No prose, just code."
)

critic = autogen.AssistantAgent(
    name="Critic", llm_config=llm_config,
    system_message="Review code for bugs, edge cases, and style. Be concise."
)

user_proxy = autogen.UserProxyAgent(
    name="User", human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False}  # True in production
)

# Assemble the group chat
groupchat = autogen.GroupChat(
    agents=[user_proxy, planner, coder, critic],
    messages=[],
    max_round=12,
    speaker_selection_method="auto"  # LLM picks next speaker
)

# The manager orchestrates the conversation
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

user_proxy.initiate_chat(
    manager,
    message="Build a FastAPI endpoint that analyzes sentiment in text"
)

Speaker Selection Methods

Method | How It Works | Best For
auto | GroupChatManager (LLM) picks the most relevant agent | General purpose, flexible tasks
round_robin | Agents take turns in order | Structured review loops
random | Random selection each turn | Exploration, diversity of views
manual | Human picks each time | Interactive debugging
custom fn | Your function selects the speaker | Complex routing logic

Custom Speaker Selection

custom_speaker.py
def custom_speaker_selector(last_speaker, groupchat):
    # Always follow coder with critic
    if last_speaker.name == "Coder":
        return next(a for a in groupchat.agents if a.name == "Critic")
    # Always follow critic with user_proxy (to run fixed code)
    if last_speaker.name == "Critic":
        return next(a for a in groupchat.agents if a.name == "User")
    return "auto"  # let LLM decide otherwise

groupchat = autogen.GroupChat(
    ...,
    speaker_selection_method=custom_speaker_selector
)
⚠️
Token cost warning: Each agent in a group chat sees the full conversation history, so with many agents and long conversations costs scale quickly. Cap max_round, keep system messages concise, and skip agent introductions (send_introductions=True) unless the team genuinely needs them.
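A back-of-envelope model of why costs escalate: each round appends one message, and the next speaker re-reads the entire history, so cumulative prompt tokens grow roughly quadratically with the number of rounds. The per-message and system-prompt sizes below are illustrative assumptions:

```python
def estimated_prompt_tokens(rounds: int, tokens_per_message: int = 200,
                            system_tokens: int = 300) -> int:
    """Rough cumulative prompt tokens consumed across a group chat.

    At round r the speaker's prompt holds its system message plus all
    r - 1 earlier messages. Toy model; real usage varies widely.
    """
    return sum(system_tokens + (r - 1) * tokens_per_message
               for r in range(1, rounds + 1))
```

Under these assumptions, doubling max_round from 12 to 24 roughly quadruples the history portion of the bill, which is why capping rounds matters more than it first appears.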
Module 07 · Production

Memory & RAG

Give agents long-term memory with vector stores and retrieval-augmented generation.

By default, AutoGen agents have no memory between conversations. Every initiate_chat call starts fresh. For production systems, you need persistent memory; AutoGen provides RetrieveUserProxyAgent and hooks for external vector stores.

Built-in RAG: RetrieveUserProxyAgent

rag_agent.py
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

rag_agent = RetrieveUserProxyAgent(
    name="rag_agent",
    retrieve_config={
        "task": "qa",
        "docs_path": ["./my_docs/", "https://example.com/api-docs"],
        "chunk_token_size": 2000,
        "model": "gpt-4o",
        "vector_db": "chroma",   # or "pgvector", "qdrant"
        "collection_name": "my_docs",
        "get_or_create": True,     # reuse existing collection
    },
    code_execution_config=False,
    human_input_mode="NEVER"
)

rag_agent.initiate_chat(
    assistant,
    message=rag_agent.message_generator,  # required in newer pyautogen releases
    problem="What does our API return when authentication fails?"
)

Memory Architecture Patterns

🔍
Vector DB (Semantic Memory)
Store embeddings of past conversations, docs, or facts. Retrieve semantically similar content. Best for: knowledge bases, long-term recall.
🗄️
SQL DB (Structured Memory)
Store structured facts: user preferences, task history, entities. Query with precise filters. Best for: user profiles, audit trails.
🕸️
Graph DB (Relational Memory)
Model relationships between entities. Best for: knowledge graphs, dependency tracking, multi-hop reasoning over connected facts.
⚡
In-Context (Short-term)
Conversation history in the context window. AutoGen manages this automatically. Limited by token budget; summarize old messages.
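The "summarize old messages" tactic from the in-context card can be sketched as a rolling window. The summarizer below is a deliberate stub; production code would call an LLM to write the summary:

```python
def compress_history(messages: list[str], keep_last: int = 4) -> list[str]:
    """Keep the newest messages verbatim; fold older ones into one summary line."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # Stub summary: truncate each old message. In a real system, replace
    # this with an LLM call ("Summarize these turns in 3 sentences").
    summary = "Summary of earlier turns: " + " | ".join(m[:40] for m in old)
    return [summary] + recent
```

The token budget then stays roughly constant: one summary line plus the last few turns, regardless of how long the conversation runs.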

Custom Memory via Tools

custom_memory.py
import chromadb

# PersistentClient keeps the collection on disk between runs;
# the in-memory Client() would forget everything at process exit
chroma_client = chromadb.PersistentClient(path="./agent_memory_db")
memory_collection = chroma_client.get_or_create_collection("agent_memory")

# Give agents tools to read/write memory
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Save a fact to long-term memory")
def save_memory(key: str, value: str) -> str:
    memory_collection.add(
        documents=[value],
        ids=[key]
    )
    return f"Saved: {key}"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search memory for relevant facts")
def search_memory(query: str) -> str:
    results = memory_collection.query(query_texts=[query], n_results=3)
    return str(results["documents"])
πŸ—οΈ
Production pattern: Use Vector DB for semantic long-term memory + SQL for structured facts + in-context summarization for recent turns. This tri-layer architecture handles the full spectrum of memory needs for production chatbots and agentic pipelines.
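A minimal sketch of how the three layers might be stitched into a single prompt context. All names are illustrative, and the inputs are plain Python stand-ins for a vector store, a SQL table, and the recent chat buffer:

```python
def build_context(query: str,
                  vector_hits: list[str],
                  sql_facts: dict[str, str],
                  recent_turns: list[str]) -> str:
    """Combine the three memory layers into one prompt context block."""
    parts = ["## Relevant knowledge (vector search)"]
    parts += [f"- {doc}" for doc in vector_hits]
    parts.append("## User facts (SQL)")
    parts += [f"- {k}: {v}" for k, v in sql_facts.items()]
    parts.append("## Recent conversation")
    parts += recent_turns
    parts.append(f"## Current question\n{query}")
    return "\n".join(parts)

context = build_context(
    "What plan is the user on?",
    vector_hits=["Pricing doc: Pro plan includes API access"],
    sql_facts={"plan": "Pro", "signup_date": "2024-03-01"},
    recent_turns=["user: can I use the API?"],
)
```

The resulting string would be prepended to (or injected into) the agent's prompt, so each layer contributes only what it is best at.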
Module 08 · Production

AutoGen vs. Other Frameworks

When to use AutoGen, when to use something else, and how to combine them.

Decision Framework

Scenario | Best Pick | Why
AI writes & debugs code autonomously | AutoGen | Code execution loop + multi-agent review is AutoGen's sweet spot
Research: gather, analyze, synthesize | AutoGen | Autonomous reasoning + tool use + multi-agent collaboration
Strict step-by-step business workflow | LangGraph / n8n | Deterministic control flow with explicit state management
Role-based teams (PM, dev, QA) | CrewAI | First-class role/goal/task primitives
SaaS integration automation | n8n | 500+ no-code connectors, trigger-based workflows
Complex RAG over many data sources | LlamaIndex + AutoGen | LlamaIndex handles retrieval; AutoGen handles agentic reasoning

Complementary Architectures

The real power comes from combining these frameworks, not choosing one.

n8n trigger (email arrives)
    → LlamaIndex retrieves relevant docs
    → AutoGen multi-agent reasons + writes response
    → n8n sends reply via Gmail connector
    → SQL DB logs outcome

AutoGen Strengths & Weaknesses

✅
Strengths
• Autonomous code writing + execution loop
• Flexible conversation patterns
• Strong Microsoft ecosystem integration
• Human-in-the-loop at any granularity
• Active research + rapid updates
⚠️
Weaknesses
• Less deterministic than graph-based tools
• Token costs can escalate in group chats
• Debugging multi-agent loops is hard
• v0.4 API still maturing
• Less enterprise-grade tooling than LangGraph

Key Takeaways

🎯
Use AutoGen when the task requires genuine AI judgment, where the steps can't be pre-defined. For operational automation with predictable steps, use deterministic tools. The best production systems often combine both.
📚
Next steps: Try microsoft.github.io/autogen for official docs. The AutoGen Studio UI lets you prototype multi-agent systems visually. For production, explore AutoGen + Azure AI Foundry for managed execution.
🧠 Final Check: A company wants to automatically route customer emails to billing, tech, or sales teams using AI. Which combination is best?
A Pure AutoGen group chat for everything
B Pure n8n with keyword filtering
C n8n triggers email → AutoGen Swarm routes intelligently → n8n sends response
D CrewAI with three specialized agents
🎓
Course Complete!
You've covered: agent types, conversation patterns, tool use, group chat, memory/RAG, and framework selection. You're ready to build production AutoGen systems.